8 research outputs found

    FM-index on GPU : a cooperative scheme to reduce memory footprint

    The FM-index is a data structure that is increasingly widely used, particularly in high-throughput bioinformatics. Algorithms based on it exhibit a pseudo-random memory access pattern; as a consequence, they are usually bound by memory bandwidth rather than by computation. Naive GPU implementations are no exception. Here we show that combining a compact design of the FM-index with a thread-cooperative approach restores the balance between memory traffic and computation. The resulting solution is less memory-bandwidth intensive and allows full exploitation of the GPU's computational resources across several GPU architectures.
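
The abstract does not spell out the compact layout. As a rough illustration only, the Python sketch below implements FM-index backward search with an occurrence table that stores counts only at checkpoints every `BLOCK` positions. The remaining characters are counted by scanning the BWT inside the block. On a GPU, that in-block scan is the kind of work several threads could share. The checkpoint scheme, the `BLOCK` value, the helper names (`build_fm_index`, `occ`, `count_matches`) and the naive suffix-array construction are all assumptions, not the paper's design.

```python
# Minimal FM-index sketch with a checkpointed occurrence table.
# Assumptions (illustrative only): DNA alphabet ACGT, '$' sentinel,
# naive suffix-array construction, checkpoints every BLOCK positions.

ALPHABET = "ACGT"
BLOCK = 4  # hypothetical checkpoint interval; real indexes typically use larger blocks


def build_fm_index(text):
    text += "$"
    # Naive suffix array: fine for a sketch, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # bwt[j] is the character preceding suffix sa[j]; text[-1] == '$' covers sa[j] == 0.
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in the text that sort before c ('$' sorts first).
    C, total = {}, 1
    for c in ALPHABET:
        C[c] = total
        total += text.count(c)
    # checkpoints[k][c] = occurrences of c in bwt[:k * BLOCK]; only these are stored.
    checkpoints = [
        {c: bwt[:pos].count(c) for c in ALPHABET}
        for pos in range(0, len(bwt) + 1, BLOCK)
    ]
    return bwt, C, checkpoints


def occ(index, c, i):
    """Occurrences of c in bwt[:i]: nearest checkpoint plus an in-block scan."""
    bwt, _, checkpoints = index
    k = i // BLOCK
    return checkpoints[k][c] + bwt[k * BLOCK:i].count(c)


def count_matches(index, pattern):
    """Backward search: number of (possibly overlapping) occurrences of pattern."""
    bwt, C, _ = index
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(index, c, lo)
        hi = C[c] + occ(index, c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_matches(build_fm_index("ACGTACGTAC"), "ACG")` returns 2. Larger blocks shrink the stored table but lengthen each in-block scan. Spreading that scan across cooperating threads is one plausible way to keep a larger block cheap.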

    Thread-cooperative, bit-parallel computation of Levenshtein distance on GPU

    Approximate string matching is a very important problem in computational biology, and it requires the fast computation of string distance as one of its essential components. Myers' bit-parallel algorithm improves on the classical dynamic programming approach to Levenshtein distance computation and offers competitive performance on CPUs. The main challenge in designing an efficient GPU implementation is to expose enough SIMD parallelism while keeping each thread's working set relatively small. In this work we implement and optimise a CUDA version of Myers' algorithm suitable for use as a building block for DNA sequence alignment. We achieve high efficiency through a cooperative parallelisation strategy for (1) very-long integer addition and shift operations, and (2) several simultaneous pattern matching tasks. In addition, we explore the performance impact of using features specific to the Kepler architecture. Our results show an overall performance on the order of tera cell updates per second on a single high-end Nvidia GPU, and speedups in excess of 20× over a sixteen-core, non-vectorised CPU implementation.
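
Myers' algorithm encodes a column of the dynamic-programming matrix as bit-vectors of vertical deltas. It updates them with a few bitwise operations, one addition and a couple of shifts per text character. The Python sketch below follows the commonly published global-distance formulation (Hyyrö's presentation of Myers' algorithm). Python's arbitrary-precision integers stand in for the very long bit-vectors that a GPU version would have to split across cooperating threads, which is where cooperative carry propagation for the addition and shifts would come in. The function name and the dictionary-based match-mask table are illustrative assumptions, not the paper's CUDA code.

```python
# Sketch of Myers' bit-parallel Levenshtein distance (global variant, as
# presented by Hyyrö). Python's unbounded ints play the role of the
# very long bit-vectors a GPU version would split across threads.


def myers_levenshtein(pattern, text):
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1   # keep bit-vectors to m bits
    high = 1 << (m - 1)   # bit for the last row of the DP column
    # peq[ch] has bit i set where pattern[i] == ch
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # column 0: every vertical delta is +1
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq  # the carry chain is the long addition
        ph = mv | (~(xh | pv) & mask)      # horizontal +1 deltas
        mh = pv & xh                       # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift a 1 into ph: the top DP row grows by one per text character.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score
```

For example, `myers_levenshtein("kitten", "sitting")` returns 3. In a fixed-width implementation, patterns longer than one machine word need the multi-word addition and shifts that the abstract describes.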

    Optimization of a bioinformatics sequence-alignment application running on many-core processors (GPUs)

    Genomic sequence analysis tools allow biologists to identify and understand fundamental regions implicated in genetic diseases. There is currently a need to provide the scientific community with efficient analysis tools. This project characterises and analyses the performance of algorithms used for whole-genome sequence comparison running on multi-core and many-core architectures. Based on this analysis, it evaluates the suitability of these architectures for the genomic sequence comparison problem. Finally, it proposes a series of modifications to the implementations of these algorithms aimed at improving their performance.

    Boosting the FM-index on the GPU : effective techniques to mitigate random memory access

    The recent advent of high-throughput sequencing machines producing large volumes of short reads has boosted interest in efficient string searching techniques. Today, many mainstream sequence alignment tools rely on a special data structure, the FM-index, which allows fast exact searches in large genomic references. However, such searches translate into a pseudo-random memory access pattern, making memory access the limiting factor of all computation-efficient implementations, on both CPUs and GPUs. Here we show that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses. The combination of these and other optimisations yields an implementation that processes about 2 Gbases of queries per second on our test platform. This is about 8× faster than a comparable multi-core CPU version and about 3× to 5× faster than the GPU FM-index implementation provided by the recently announced Nvidia NVBIO bioinformatics library.
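
To illustrate why a k-step index reduces memory accesses, the sketch below generalises backward search to consume k pattern characters per step. C_k(w) counts suffixes whose first k characters sort before the k-mer w, and Occ_k(w, i) counts ranks j < i in the suffix array whose suffix is preceded by w. The new interval is then [C_k(w) + Occ_k(w, lo), C_k(w) + Occ_k(w, hi)). The tables are plain Python lists and dictionaries built from a naive suffix array, and leftover characters are handled with ordinary 1-step searches. These choices, and the names `build_kstep_index` and `kstep_count`, are assumptions for illustration rather than the paper's compact GPU layout.

```python
# Sketch of k-step backward search: each step consumes up to k pattern
# characters with one interval update. Assumptions (illustrative only):
# '$' sentinel, naive suffix array, Python lists/dicts instead of compact tables.
from bisect import bisect_left


def build_kstep_index(text, k):
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    tables = {}
    for step in {1, k}:  # k-mer table plus a 1-step table for leftovers
        # keys[j]: first `step` chars of the j-th smallest suffix (non-decreasing)
        keys = [text[i:i + step] for i in sa]
        # occ[w]: sorted ranks j whose suffix is preceded by the string w
        occ = {}
        for j, i in enumerate(sa):
            if i >= step:
                occ.setdefault(text[i - step:i], []).append(j)
        tables[step] = (keys, occ)
    return sa, tables, k


def kstep_count(index, pattern):
    """Return (occurrences of pattern, number of backward-search steps)."""
    sa, tables, k = index
    lo, hi, steps = 0, len(sa), 0
    pos = len(pattern)
    while pos > 0 and lo < hi:
        # k-mers from the right end, then single characters for the leftover prefix
        step = k if pos >= k else 1
        w = pattern[pos - step:pos]
        keys, occ = tables[step]
        hits = occ.get(w, [])
        base = bisect_left(keys, w)        # C_k(w)
        lo = base + bisect_left(hits, lo)  # C_k(w) + Occ_k(w, lo)
        hi = base + bisect_left(hits, hi)  # C_k(w) + Occ_k(w, hi)
        pos -= step
        steps += 1
    return hi - lo, steps
```

With k = 2, searching the 10-character text `ACGTACGTAC` for itself takes 5 steps instead of 10. Each step is one interval update, so the gain comes from fewer pseudo-random lookups. The cost is larger tables, since a DNA index must cover up to 4^k distinct k-mers.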
